arXiv: 1505.07215vl [stat.ME] 27 May 2015 


Modelling aggregation on the large scale and regularity on the 
small scale in spatial point pattern datasets 


Frederic Lavancier^’^ and Jesper Mpller^ 

^Laboratoire de Mathematiques Jean Leray, University of Nantes, Frederic.Lavancier@univ-nantes.fr 

^Inria, Centre Rennes Bretagne Atlantique 
^Department of Mathematical Sciences, Aalborg University, jm@math.aau.dk 


Abstract 

We consider a dependent thinning of a regular point process with the aim of obtaining 
aggregation on the large scale and regularity on the small scale in the resulting target 
point process of retained points. Various parametric models for the underlying processes 
are suggested and the properties of the target point process are studied. Simulation and 
inference procedures are discussed when a realization of the target point process is ob¬ 
served, depending on whether the thinned points are observed or not. The paper extends 
previous work by Dietrich Stoyan on interrupted point processes. 
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1 Introduction 


In the spatial point process literature, analysis of spatial point pattern datasets are of¬ 
ten classified into three main cases (see e.g. 


Waagepetersen ( 2004[ )): 

i) Regularity (or inhibition or repulsiveness)- 


Cressie 

(1993 

), 

Diggle 

(2003), and 

Mpller and 


-modelled by Gibbs point processes (Ruelle 


1969 


1986 


Lieshout] 


2000 


Chiu et ah, 2013), Matern hard core models of types I-III (Matern 


Mpller et ah, 2010|, other types of hard core processes (Illian et ah, 2008), anc 


determinantal point processes ( Macchif 1975; Lavancier et ah, 2015). 

(ii) Complete spatial randomness—modelled by Poisson point processes (Kingman 

(iii) Aggregation (or clustering)— 


1993). 


Jones 

2003D, Cox processes (|Cox 

1975 

McCullagh and Mpller 

2006 


1972), and permanental point processes (Macchi 


A popular and simplistic way to determine (i)-(iii) is in terms of the so-called pair correlation 
function (Illian et ah, 2008): Denote p and g the intensity and pair correlation functions 
for a spatial point process defined on the d-dimensional Euclidean space (with d = 2 or 
d = 3 in most applications; formal definitions of p and g are given in Section 2.1). For ease 


of presentation, we assume second order stationarity and isotropy, i.e. p is constant and for 
any locations xi,X 2 £ g{xi,X 2 ) = go{r) depends only on the distance r = ||xi — X 2 \\- 
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Intuitively, pdx is the probability that the process has a point in an infinitesimally small 
region of volume (Lebesgue measure) dx, and for r = ||xi — X 2 II > 0, go{r)p‘^ dxidx 2 is the 
probability that the process has a point in each of an infinitesimally small region around xi, X 2 
of volumes dxi, dx 2 . Typically, go{r) tends to 1 as r —)• 00 , and we are usually interested in 
the behaviour of go{r) for small and modest values of r. We expect in case of (i), go{r) < 1 
when r is small, and go{f) is less than or fluctuating around 1 otherwise; in case of (ii), go = \\ 
and typically, in case of (iii), <70 > 1 - 


For applications the classification (i)-(iii) can be too simplistic, and there is a lack of useful 
spatial point process models with, loosely speaking, aggregation on the large scale and reg¬ 
ularity on the small scale. One suggestion of such a model is a dependent thinning of e.g. 
a Poisson cluster point process where the thinning is similar to that in a Matern hard core 


process (see Andersen and Hahn (2015)) or to that in spatial birth-death constructions for 


Gibbs point processes (see Kendall and Mpller (2000) and Mpller and Waagepetersen (2004)). 
Theoretical expressions for intensity and pair correlation of such Matern thinned point pro¬ 
cesses have been derived by Palm theory, and their numerical evaluation can be obtained by 


approximations, cf. Andersen and Hahn (2015), while the spatial birth-death constructions 


are mathematical intractable. Another possibility is to consider a Gibbs point process with 
a well-chosen potential that incorporates inhibition at small scales and attraction at large 


scales. A famous example is the Lennard-Jones pair-potential (Ruelle, 1969), and other spe¬ 


cific potentials of this type can be found in Goldstein et al. (2015). Unfortunately, in general 


for Gibbs point processes the intensity and the pair correlation function are unknown and 


simulation requires elaborate MGMC methods (Mpller and Waagepetersen, 2004). 


This paper discusses instead a model for a spatial point process X obtained by an independent 
thinning of a spatial point process Y where the selection probabilities are given by a random 
process H = {n(x) : x G M'^} C [0,1]: We view X and Y as random locally finite subsets of 
and let 

A = {x G y : n(x) > U{x)] (1) 

where U = {U{x) : x G consists of independent uniformly distributed random variables 


between 0 and 1, and where T, H, U are mutually independent. Dietrich Stoyan (Stoyan, 1979 


Ghiu et ^ 2013) called X an interrupted point process, which we agree is a good terminology 
when each n(x) is either 0 or 1; indeed, in all Stoyan’s examples of applications, this is the 
case, though the general theory presented is not making this restriction. Glearly, H should not 
be deterministic, because then the pair correlation functions for X and Y would be identical 
{qx = Qy)- We have in mind that a realization of X is observed within a bounded window VF, 
while we treat (H, U) as being unobserved, and Y as being or not being observed within W. 
For instance, we can think of Y as describing an inhibitive behaviour of some plant locations 
under optimal conditions, and X as the actual plant locations due to unobserved covariates 
(e.g. light conditions, level of water underground, and quality of soil). 


Our idea is that it is possible to choose models for Y from the class (i) above together 
with models for H such that X exhibits small scale regularity and large scale aggregation. 
Some examples of simulated realizations of this kind of models are shown in Figure Our 
idea is demonstrated in Sections [3]j^ and it can be briefly understood as follows. Section |2.1| 
establishes a simple relationship between the pair correlation functions for X and Y : For 
simplicity, assume second order stationarity and isotropy of both Y and H, whereby our 
target point process X becomes second order stationary and isotropic. The (isotropic) pair 
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( 2 ) 


correlation function of X is given by 


gx,o(r) = Mo(r)gY,o(r) 


where, setting 0/0 = 0, 

Mo(r) = M{xi,X2) = 


E[n(xi)n(x2)] 


xi,x2 e 


( 3 ) 


E[n(a:i)]E[n(x2)]’ 

depends only on r = ||xi — X 2 \\- For example, if 11 is positively correlated (i.e. Mq > 1) and 
y is a determinantal point process (this process is described in Section 2.2), then gyfl < 1 
and in accordance with ([^ we may obtain a behaviour of gxfi as we wish, namely that gxfi 
is smaller respective larger than 1 on the small respective large scale. Examples appear later 
in Figure 



Figure 1: Examples of realizations within a unit square of X given by the models 1.-4. in 
Section |4.3[ In the first row E is a determinantal point process, in the second row E is a 
Matern hard core process of type II, in the hrst column — log 11 is a y^-process, and in the 
second column 11 is the characteristic function for a Boolean disc model. 


We thank Ute Hahn for reminding us about Stoyan’s interrupted point process in Stoyan 


(1979), Chiu et al. (2013), and Kautz et al. (2011). In Stoyan (1979) and Chiu et al. (2013) he 
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considered mainly the planar case where y is a Matern hard core process of type II and where 
n is the characteristic function of a motion invariant random closed set whose distribution 
apart from E[n(o)] and Mo(r) is unspecified (o denotes the origin in W^). In contrast we 
consider various models for both Y and 11, where e.g. our y^-process model for — log 11 seems 
more realistic for applications like the example studied in Section 5.1 Moreover, we discuss 
simulation and parametric inference procedures depending on whether Y is observed or is a 
latent process. Finally, we notice that the paper Kautz et al. (2011) is in another direction 
than ours, since they consider Y to be a Matern cluster process (which is of class (iii) above) 
and n to be the characteristic function for a motion invariant random closed set, i.e. X 
becomes of class (iii). 


Our paper is organized as follows. Section recalls some background material and deals with 
some inhibitive point process models for Y where py and gy^ are known, namely determinan- 
tal point processes and Matern hard core models of type I or II. Section introduces models 
for n, based on transformed Gaussian processes and Boolean models, which combined with 
the models for Y allow us to further study the behaviour of gxfl- Section discusses first 
simulation of (F, n,X), and second inference for parametric models for Y and II, depending 
on whether Y is observed or not. Finally, Section fits parametric models to examples of 
spatial point pattern datasets using the methodology from Section 


2 Preliminaries 

Let the situation be as in Section 1. This section recalls the definitions and some properties of 
product densities for a spatial point process in general and for determinantal point processes 
and Matern hard core models of types I-II in particular. 


2.1 Product densities and assumptions 

For n = 1, 2,..., suppose that py ^ i—)• [0, oo) is a Borel function satisfying the so-called 

Campbell formula 

E ^ f{xi,...,Xn) = j---jfixi,...,Xn)Py\xi,...,Xn)dxi---dXn (4) 

Xl,...,Xn.ey 

for any non-negative Borel function /, where means that xi,... ,Xn are pairwise distinct. 
Then py' is called an nth order product density of Y. Such a function is apart from a Lebesgue 
nullset uniquely determined by the Campbell formula. Henceforth, for ease of presentation, 
we ignore nullsets. In particular, py = is the intensity function. Furthermore, setting 
0/0 = 0, the pair correlation function is dehned by gy{xi,X 2 ) = PyTxi,X 2 )/[py(xi)py(x 2 )]. 
The usual practice is to set py ^(xi,... ,Xn) = 0 if Xj = Xj for some i / j. An exception is 
the case of a Poisson process Y where often one takes Py'^(xi,X 2 ) = Pr(xi)pv(x 2 ) so that 
py(xi,X 2 ) = 1 if py(xi)py(x 2 ) > 0. 

Recall that we assume for simplicity that Y is second order stationary and isotropic. We also 
assume that the first and second order intensity functions exist. Thus we can consider the 
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versions where pY is constant and where 5 y(xi,X 2 ) = 5y,o(^) depends only on the distance 
r = ||xi — X 2 ||. We call qq the isotropic pair correlation function. We furthermore assume 
/Oy > 0 (otherwise Y is empty, which is not a case of interest). 

(n) 

Similarly, and gxfl denote the nth order product density and the isotropic pair correlation 
function of X. By conditioning on Y and using Q and (Q it is straightforwardly verified that 
exists whenever py ^ exists, in which case 

Px\xi: . . . , Xn) = E[n(xi) • • • n(x„)]p^^(xi, . . . , Xn). 

Consequently, for any r > 0, 


px = qpY, ffx,o(0 = Mo{r)gY,o{r), 


(5) 


where q 
given in 


= E[n(o)] denotes the mean 
Stoyanj (1979). 


selection probability. Equation ([^ is similar to results 


For later purposes, denote X = Y \ X the complementary set of X in Y. Then, using an 
obvious notation, 


P^S\xi, ...,Xn)= E[(l - n(xi)) • • • (1 - n(Xn))]py ''(xi, . . . , Xn) 


("•)/ 


and hence by and ([^, 


n ^ ^ ^ 1 “ 2q + q'^Mo{r) 

Px = (1 - q)PY, 9x,oY) = - - 9YflY)- 


( 6 ) 


Finally, the cross pair correlation between X and X (see e.g. Mpller and Waagepetersen (2004) 
for a definition) is given by 


q - q^Mo{r) 


9xxfl{r) = - -ffy,o(0- 


(7) 


2.2 Determinantal point processes 


Let C be a complex function dehned on and E be a determinantal point process (DPP) 

with kernel C. By definition this means that for any n = 1,2,... and any xi,..., x^ € 
pppxi,..., Xn) exists and is equal to the determinant of the n x n matrix with (i, j)th entry 
C{xi,Xj). For background material on DPPs, including conditions for their existence, see 


Lavancier et ah (2015) and the references therein. 


For simplicity and specificity, we assume that C is a stationary and isotropic covariance func¬ 
tion, i.e. C'(x, y) = Codlx—?/||) is real and non-negative definite. Clearly, Y is then a stationary 
and isotropic DPP, and we write Y ~ DPP(C'o). We also assume that CodlxH) is continuous 
and square integrable, i.e. r'^~^\Co{r)\‘^ dr < oo. By Theorem 2.3 and Proposition 3.1 in 


Lavancier et ah (2015), the existence of DPP(C'o) is then equivalent to that cpo < 1, where 


Po{r) = 


2 C'o(s) cos(27rrs) ds if d = 1 

/o“ C'o(s) Jrf/2_i(27rrs)s'^/2 if d = 2, 3,... 
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is the spectral density associated to Cq and Jd/ 2-1 is the Bessel function of order d/2 — 1. 
Observe that 

Py = C'o(O) (8) 

which is assnmed to be strictly positive, and 

gY,o{r) = 1 - CQ{rflp\ (9) 


is 1 minus the squared correlation function. This implies that g'y < 1 which is one among 


many other properties confirming that a DPP is inhibitive, cf. Lavancier et al. (2015) and 


Biscio and Lavancier (2015). 


2.3 Matern hard core models of types I-II 


Following Matern (1986) we define hard core point processes as follows. Let <1> be a stationary 
Poisson process on with intensity p# > 0, D > 0 a hard core parameter, and V = {V{x) : 
X G a random process of independent uniformly distributed random variables between 
0 and 1, where and V are independent. Denote cod = 7r'^/^/r(l + (i/2) the volume of the 
d-dimensional unit ball, and kd{r, D) the volume of the intersection of two d-dimensional 
balls of radii D and distance r between their centres. The Matern hard core model of type I, 
denoted 1/, is given by the points in $ which are not D-close to some other point in <h, i.e. 


Y/ = {x G : ||x — y\\ > D whenever y G \ {x}}. 

For the Matern hard core model of type II, denoted Y//, we interpret V{x) as the birth time 
of X and let Y// consist of the points x G such that no other D-close point in <I> is older 
than X, i.e. 


Y// = {x G : ||x — y|| > D whenever y G <I> and Y(x) > Y(y)}. 


These hard core point processes are stationary and isotropic with intensities 

[ ^d\ 1 - exp {-p^uJdD<^) 

PY, = P^ exp [-pq,UJdD j , pYi, = --, 

and pair correlation functions 

gYi,o{r) = l(r > D) ex.p{pq>kd{r, D)) 

and 

2uJdD^ 


gY,,fl{r) =l[r > D] 


1 - 


{uJdD'^ -kd{r,D)){l- exp {-pq^UdD^^)) 

1 - exp (-p$ {2uJdD^ - kd{r, D))) 
{2uJdD<^ -kdir,D)){l- exp {-p^uJdD^))^’^ 


( 10 ) 


( 11 ) 


( 12 ) 
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where l(-) is the indicator function. Note that 1/ C Yu and pYj < PYu- The pair correlation 


functions in (11)-(12) are continuous except at r = D, 0 when r < D, strictly decreasing for 


r €]D, 2D[, and 1 when r > 2D. Finally, note that kd{r, D) = 0 ii r > 2D, and 


kd{r, D) = < 


2D-r 

2D^ arccos (^) 

47r 7-)3 (-[ _ i 
3 4D T J 




if d = 1 
if d = 2 
if d = 3 


(13) 


if r < 2D. 


3 Specific models for the selection probabilities 


consider two 

classes of models for 11 where explicit expressions for our main characteristics {q, px,Mo, gxfi) 
are available. 


Section 3.1 discusses the implications of Q in general, while Sections 3.2||3.3 


3.1 General results and conditions 

In the remainder of this paper, to exclude non-interesting cases, we focus on the following 
situation. We always assume that n(o) has a positive variance or equivalently that Mo(0) > 1, 
since we do not want II to be deterministic. In addition, we always assume that px > 0 (or 
equivalently py > 0 and > 0), because otherwise X would be almost surely empty. Since it 
is typically the case that an isotropic pair correlation function tends to 1 as the distance tends 
to infinity, we want Mo(r) to tend to 1 as r —)■ oo, cf. ([^. Therefore we are not so interested 
in the case where n(x) does not depend on the location x, since then Mq is a constant > 1 
and n is deterministic if Mq = 1. 

We have gx,o = gvfi if and only if II is uncorrelated, cf. (§ and (§. If n is non-negatively 
correlated, i.e. Mq > 1, then gx,o > 5 y, 0 ) so gyfl cannot cross 1 before gx,o crosses 1. If II is 
positively correlated, i.e. Mq > 1, then gx,o > 9 y,o- If II can be negatively correlated, a rather 
peculiar behaviour of gx,Q may happen and we shall exclude this case in our specific models. 

By Cauchy-Schwartz inequality and since 11^ < II, we have for r = ||xi — X 2 II > 0, 

, E|n(x,)nfe)] ^ v'E|n(i,)2]v'E[n(i.)2] ^ v'E|n(ii)]yE|n(i.)] i 

Afc(r) =- ^5 - < - 

Combining this with ([^, we obtain an upper bound: gx,o < gYfliQ- 

Dehne ry = supjr > 0 ; gY,oi^) = 0 whenever r < r}. When ry > 0 we say that T is a 
hard-core process with hard-core parameter ry. Assume that Mo(r) > 0 for all r > 0; this is 
satisfied for all the models of II specified later in this paper. Then tx = ty, cf. ([^. Hence X 
is a hard-core process if and only if T is a hard-core process. 

At the small scale, i.e. when r < r where r > 0 is a sufficiently small constant, we have the 
following. 
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(a) Assume Mq is continuous. Since Mo(0) > 1, we can assume that Mo(r) > 1 for r < r. 
Hence, by 0, for r < r, either gxfi{r) > gvflir) or gx,o{r) = gvfiir) = 0. 

(b) Assume both Mq and gvfl are continuous. Then gxfl is continuous, and so we can assume 
that gY,o < gxfl{'>') < 1 for r < r. Consequently, at distance r < t, the inhibitive 
behaviour of Y (quantihed in terms of its pair correlation function) is preserved in X 
but it cannot be stronger. 

In brief we will be interested in models where H is positively and not too weakly correlated 
at the small scale. 


At the large scale, basically the properties of gx depends on q and the range of correlation 
of n. If q is large, then since gx,o < 9y,o/q we may have gx < 1, meaning that no clustering 
is created by the thinning process; an obvious example is when gy^ < 1, as e.g. in a DPP. If 
q is sufficiently small, then sup^5(x(?’) > 1 occurs in our examples of models, and we expect 
this to be the situation in many other cases. However, it is not true that there always exists 
q such that sup^ gx{r) > 1. An obvious counterexample is when H is uncorrelated; other 
counterexamples may be constructed when the variance say, of H is such that cP' jq^ —?• 0 
as g —)• 0. On the other hand, assume q is fixed and H is non-negatively correlated, then it is 
always possible to get sup^ > 1 by increasing the range of correlation of H, i.e. making 


Mo(r) > 1 for r sufficiently large. This is exemplified in Sections 3.2 3.3 


3.2 Transformed Gaussian processes 


This section assumes — log H is the y^-process given by 


n(x) = exp 



X G M'^, 


(14) 


where Zi = {Zi(x) : x G M*^}, i = 1,... ,k, are i.i.d. zero-mean real Gaussian processes with 
covariance function AT : x i—>■ M. 

A straightforward calculation yields E[n(x)] = 1/(1 + K{x,x))^P and 


M{xi,X2) 


_(1 + K{xi,Xi)){l + K{x 2 ,X 2 )) _ 

(1 A'(xi,xi))(l -h K{x2,X2)) - |A(xi,X2 )P 


Hence, for K{xi,xi)K{x 2 , X 2 ) > 0 and defining i?(xi, X 2 ) = K{xi,X 2 )/\/K{xi,xi)K{x 2 ,X 2 ), 


M{xi,X2) 


_ R{xi,X2f _ 

(1 + 1/A:(xi,xi))(1 + 1/K{x2,X2)) 


is an increasing function of \R{xi,X 2 )\,K{xi,xi),K{x 2 ,X 2 ), respectively. 

In the sequel we assume stationarity and isotropy of K(xi,X 2 ) = ATodl^^i — a^ 2 ||)) whereby H 
is stationary and isotropic. Defining k = Kq{o) and Rq = KqIk, we notice as the variance k 
increases from zero to infinity that 


q=l/{l + K)P^ 


(15) 
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decreases from 1 to 0, while for fixed r, 


Mo(r) 


Roir? 1 

(1 + 1/k)2_ 


-k/2 


1 - {1 - 


(16) 


increases from 1 to [1 — -Ro(^)^] Thus there is a trade-off between how large q = px!P y 
and Mo(r) = gxfl{r)/gY,o{r) can be. 


We have that Mo(r) > 1 is a decreasing function of k and Mo(r) —>• 1 as /c —>• oo, showing that 
taking a large value of k is not appropriate if we want X to exhibit a clustering behaviour at 
the large scale. Further, assume that the correlation function Rq depends on a scale parameter 
s > 0, i.e. for all r > 0, Ro{r) = RQ{r,s) = Ro{r/s) where Ro{r) = Ro{r, 1). This is so for 
most parametric models of covariance functions used in spatial statistics. Then, for any given 
q G (0,1) and A; > 1, we have M^ir) —)• [1 — (1 — g 2 /fc^ 2 j-A :/2 ^ as s —)• oo provided Rq is 
continuous at the origin. This combined with Q proves that X will necessarily exhibit some 
clustering behaviour at the large scale when s is sufficiently large. 


The effect of the parameters is illustrated in Figure which shows the pair correlation of X 
when d = 2 and Y is either a DPP or a type II Matern hard core process. Specifically, the first 
row of Figure corresponds to the case where T is a DPP with a Gaussian kernel Co{r) = 
Py exp(—(r/0.015)^) and py = 1000, while in the second row D is a type II Matern hard core 
process with D = 0.015 and p^ = 1736 whereby py = 1000. The selection probabilities are 
given by (14) where Kq is a Gaussian covariance function with scale parameter s. A joint 
realization of the restrictions of Y, 11, and A to a unit square is shown on the left hand side 
of Figure]^ 


3.3 Boolean and complementary Boolean models 


This section specifies further models for the selection probabilities. 

Let T be a stationary Poisson process on with intensity p^, and conditional on T, let Aq 
and Ax for all x G T be i.i.d. positive random variables with a distribution which does not 
depend on 'I' and so that E[Ag] < oo. Denote H the stationary Boolean model given by the 
union of the d-dimensional balls centred at the points of 4' and with radii A^,, x G 4'. Recall 
that p = P{o G H) is the volume fraction, and for x G and r = ||x||, C's(r) = P{{o, x} C H) 
is the so-called covariance function, where expressions for p,Csi'i"), and the void probability 
P{{o,x} n H = 0) are known (see e.g. Molchanov (1997)). 

Specifying 11 by the characteristic function of the random set H or its complement in 
either case X becomes stationary and isotropic: First, if 


Molchanov (1997 


n(x) = l(x G H), 


then 

q = p = l- exp ( 

and since E [n(o)n(x)] = Cs(||x||), we obtain 


Ad 

^0 


Mo{r) = ‘^ - ^ + 
p p^ 


1 -P 
P 


exp{p^E[kdir, Aq)]) . 


(17) 

(18) 


(19) 
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Figure 2: Pair correlation functions of V (dashed line) and X (solid lines) when y is a DPP 
with a Gaussian kernel (first row) or a Matern hard core process of type II (second row) and 
— log n is the x^-process given by (14) when Kq is a Gaussian covariance function with scale 
parameter s. First column: k = 1, s = 0.05, and from top to bottom q = 0.4, 0.5,0.6, 0.7, 0.8; 
second column: q = 0.5, s = 0.05, and from top to bottom k = 1,2,3,5,10; third column: 
q = 0.5, k = 1, and from top to bottom s = 0.5,0.2, 0.1, 0.05, 0.03. 


Second, if n(x) = l(x 0 H), then by Q, 


q = 1 — p = exp 





( 20 ) 


and 


Mo(r) = exp {p^E [kair, Aq)]) 


( 21 ) 


Equations (18)-(21) become explicit in the particular case of a fixed deterministic radius 


Ao > 0. When Aq is random, E [kd{r, Aq)] may be evaluated by a numeric method using (13). 
We consider later the case where Aq follows a Beta-distribution with parameters a and /3; 
then E [Aq] = B{a + d, f3)/B{a, (3) is given in terms of the beta-function. 


Note that MQ{r) —)• 1/q > 1 as E [Aq] —?• oo, showing that X will necessarily exhibit some 
clustering behaviour at the large scale if the Boolean model has large radii. The pair correlation 
function of X is represented in Figure]^ for different values of the parameters in the situation 
where Y is either a Gaussian DPP or a type II Matern hard core process as in Figure and 
n is given by 0 with a deterministic radius Aq. A joint realization of the restrictions of Y, 
n and A to a unit square is shown on the right-hand side of Figure 


Finally, we notice that another tractable model for Ft is the characteristic function for a 
random closed set given by the excursion set for a Gaussian process, where a relation between 
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Figure 3: Realization of Y (union of all points) and X (black points) within a unit square 
when y is a Matern hard core process of type II with py = 1000 and D = 0.015. Left: 
Corresponding realization of 11 (the grayscale from white to black corresponds to values from 
0 to 1) when — log 11 is the x^’P^cess given by (14) when k = 1 and Kq is the Gaussian 
covariance function with scale parameter s = 0.1. Right: Corresponding realization of 11 
represented by the characteristic function for the union of gray discs within the unit square 
and specified by the Boolean disc model with fixed radius Aq = 0.05, cf. For both plots 
q = 0.5. 


Mq and the covariance function of the Gaussian process can be established, see Chiu et al. 
(2013) and the references therein. 


4 Simulation and inference 


4.1 


In the sequel VF C denotes a bounded region (e.g. an observation window). Section 
concerns simulation of (R, 11, X) on W and conditional simulation of 11 given a realization 
of X on IF. Section |4.2| deals with parametric inference methods depending on whether we 


observe both X and R on IF or only X on W, and Section |4.3| discusses a simulation study 
for these two cases. 


4.1 Simulation and conditional simulation 


Simulating X within IF is straightforward from its definition 0 as long as we are able to 
simulate the restrictions of R and 11 to VF. Concerning our examples of R, an algorithm 
to generate a DPP within a rectangular window is detailed in [Lavancier et ah (2015) while 
a Matern hard core process of type I or II is easily simulated within any bounded region. 


For both models, some simulation routines are available in the spatstat library (Baddeley 
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Figure 4: Pair correlation functions of Y (dashed line) and X (solid line) when y is a DPP 
with a Gaussian kernel (first row) or a Matern hard core process of type II (second row) and 
n is the Boolean inclusion probability model given by 0 with a deterministic radius Aq. 
First column: Aq = 0.03 and from top to bottom q = 0.4,0.5, 0.6,0.7,0.8; Second column: 
q = 0.5 and from top to bottom Aq = 0.1, 0.05, 0.03, 0.02, 0.015, 0.01. 


and Turner, 2005) of R (R Core Team, 2014). Concerning R, simulating the model in Sec¬ 


tion 3.2 amounts to simulate a centered Gaussian process with prescribed covariance function, 
which is for instance implemented in the RandomFields library (Schlather et ah, 2015), while 
generating a Boolean disc model for the example of Section |3.3| is straightforward. 

Suppose we have fitted a model for 11, based on the observation of A on IT (e.g. using the 
method described in Section 4.2), and we are interested in the conditional simulation of 11 
(possibly restricted to W) given the observed point pattern Xyt/ = xw- This amounts to 
simulate according to the distribution of 11 given Xy/ = xw- The conditional distribution of 
Xw given 11 and Yw admits the probability mass function 


p(xvy|n, Yw) 


l{xw C Yw) 



( 22 ) 
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so the conditional distribution of 11 given Xiy = xw is 


P(n G F\Xw = x\y) oc E 


l(n G E, Xw C Yw) 



n (i-n(^^)) 


(23) 

where the constant of proportionality depends only on xw- The conditional simulation of 
n given Xw = xw would thus require some Monte Carlo based algorithm such as the 


Metropolis-Hastings algorithm in order to approximate the expectation in (23). This is in 


general prohibitively time consuming and we do not consider this conditional simulation in 
the following. 

A simpler setting occurs when both X and Y are observed on W. Let Xw = Yw \ Xw- Since 
Y and II are independent, the conditional distribution of IT given Xw = xw and Xw = xw 

is 


P(n G F\Xw = xwi Xw = Xw) oc E 


i(nGE) J] n(«) n (i-n(^)) 






(24) 


The expectation in (24) is simpler than that in (23) but in general some Monte Carlo based 


algorithm is still needed for conditional simulation. We detail two convenient situations below. 

The first case occurs when II is given by the Boolean model (0- Then simulating according to 
(24) just reduces to the conditional simulation of a Boolean random set E given that xw ^ ^ 


and Xw n E = 0. This case of conditional simulation is well known, see Lantuejoul (2002) 


In the second case, — log II is the square of a stationary and isotropic Gaussian process given 


by (14) with k = 1. Then conditional simulation of II based on (24) amounts to generate 
the Gaussian process Z given Xw = xw and Xw = xw-, which can be conducted in two 
steps. In the first step, as described below generate (Z(?/i),..., Z(y„)) given that Xw = xw-, 
Xw = Xw-, and Yw = xw U xw = {yij • • • -,yn}-, say. In the second step, simulate Z on IE, 
conditional on the values of Z{yi), i = 1,... ,n, generated in the first step. This second step 


can be done by double kriging as explained in Lantuejoul (2002) and this is implemented 


in the RandomFields library of R. For the first step, denote the number of points in xw by 
rix = n{xw), and similarly let = n{xw) so that n = Ux + Ux, and let P be the n x n 
matrix with generic element iLo(||?/i — yj\\)- Assuming P is invertible, we deduce from (24) 
that the target law admits a density in M”" of the form /(z) = cexp(— t/(z)), where c > 0, 
z = {zi,.. -,Zn), and 

.71x71 . 

U{z) = - '^zf - log(l-e"^^*") +-zP-^z, 


2=1 


i=nx-\-l 


where z' is the transpose of z. The conditional simulation of {Z{yi), ..., Z{yn)) can then be 
done by a Metropolis within Gibbs sampler as follows, where AA(0, k) denotes the centered 
normal distribution with variance k = iLo(O): 

1. generate z = (zi,..., z„) as n independent A/’(0, K)-distributed random variables; 

2. for i = 1 to n 


let Zi ~ Af{0, k), z = (zi,... ,Zi-i,Zi,Zi+i ,... ,Zn), and 6 = U(z) - U(z)] 


13 



















if (5 < 0 then set z = z; 

if (5 > 0 then with probability exp(—5) set z = z; 

3. repeat 2. and start sampling when the chain has effectively reached equilibrium. 


4.2 Inference methods 


First, assume that we observe both = xw and IV = Vw-, with xw ^ Vw-, and let 
xw = Vw \ Xw- Fitting a parametric model for Y in this setting is a standard problem of 


spatial statistics; see Lavancier et al. (2015) if y is a determinantal point process; or Illian et al. 


(2008) if y is a Matern hard core point process of type I or II. For estimation of parameters 


related to 11, assume that Mo(r) = Mo(r|g, 0n) apart from q depends on a parameter %. A 
natural idea is to base the estimation on the conditional distribution of Xw given Yw = ywj 
which has probability mass function 


p{xw\yw) = E 


n n(«) 


n (i-n(^)) 

v&yw\xw 


(25) 


Since (25) is in general intractable, we consider instead composite likelihoods for marginal 


distributions of Xw given Yw = yw-, noticing that conditional on Yw = Vw, 

• a point u G yw is in xw with probability E[n(ti)] = q, and in xw with probability 1 — q, 

• for a pair of distinct points {«, u} C 

— {u, u} C Xw with probability E {n(rt)n(u)} = q^MQ{\\v — u\\\q, %), 

— {u,v} C Xw with probability E{(1 — n(ti))(l — n(u))} = 1 — 2q + g^Mo(||u — 

— u G Xw and v G xw with probability E {n(ri)(l — n(u))} = q — q‘^MQ{\\v — u\\ |g, 0n)- 

Conditional on Yw = yw, we define the first order composite likelihood CLi{xw\yw',Q) as 
the product of the marginal selection/deletion probabilities for each of the points in yw, i-e. 


CLi{xw\yw;q)=q^^^^Hl-qr^^'^\ 


(26) 


and the second order composite likelihood CL 2 {xw\yw',q,(^n) by the product over all un¬ 
ordered pairs of points in yw, considering the probability whether those points have been 
retained or deleted, i.e. 


CL2{xw\yw',q,(^n) 


n q‘^Mo{\\v - u\\\q,en) 

{u,v}Cx^r 


n {l-2g + 9^Mo(||u-u|||gr,6»n)} 


{q-q‘^Mo{\\v-u\\\q,eu)} 

{UjtljCxyi/ 




Maximizing (26) yields the natural estimate 


q = n{xw)/n{yw)- 


(27) 


(28) 
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Inserting this into (27), the maximization of CL 2 {xw\yw'-,Q-,&n) then provides an estimate 
for the remaining parameter On. 


Second, assume that we observe Xw = xw and we want to fit a parametric model for Y 
and n based on this observation. The likelihood of is given by the mean value of the 
conditional density (22) with respect to the distribution of (n,y) on W. This mean value 
makes likelihood inference infeasible unless we use elaborate Monte Carlo procedures. Instead 
we consider estimation based on the intensity and pair correlation function for X. Here one 
possibility is composite likelihoods (see Mpller and Waagepetersen (2007) and the references 
therein) and another is minimum contrast estimation procedures. Below we concentrate on 
the latter. 


Assume that (7yo(^) = dYfliA^Y) depends on a parameter Oy and as before Mo{r) = Mo(r|g, On) 
depends on q and %. A natural and unbiased estimate of the intensity px is px = n{xw) /I bC|; 
i.e. the observed number of points divided by the Lebesgue measure of W. Given an estimate 
q of g, the relation ([^ provides the estimate py = qpx of py. The estimation problem thereby 
reduces to estimating {q, 9y, 9n)- By ([^, this can be achieved by minimum contrast estimation 
based on the pair correlation function of X: 

fTu 

{q, 9y, On) = aigmin {Mo{r\q,9uTgY,o{^\0yy - gxpi'^TV ( 29 ) 

q,9Y,dn Jri 

where 0 < n < and c > 0 are user-specihed parameters and gxp is a non-parametric kernel 
estimate of gxfi based on the data xw (we use the default estimate provided by spatstat). 
For a rectangular observation window W with minimal side length we chose after some 
experimentation, c = 1, r; = £/100 and = £/4. 


Alternatively, Ripley’s A'-function can be used instead of the pair correlation function in 
(29), where we choose n and as above but let c = 0.5 (following Diggle (2003)). For 


the models considered in this paper, the theoretical iF-function, given for d = 2 by iF(r) = 
27r Jq t 5 rx,o(^)dt, has to be approximated by numerical methods. 

Moreover, the minimum contrast estimates obtained from the pair correlation and the K- 


function can be combined to provide a better estimate. We refer to Lavancier and Rochet 


(2015) for details and consider just the example of two estimators qg and qx for q. Then the 
idea is to seek the weights (Ai, A 2 ) G with Ai + A 2 = 1 such that the linear combination 
AiQg + \ 2 gK has a minimal mean square error. The solution is (Ai, A 2 )''^ = S“^l/(l''^S“^l), 
where S is the mean square error matrix of {qg^qx) and 1 = (1, l)"*^. An adaptive choice is 
obtained by replacing S by an estimate S in the previous formula, where S can be obtained by 
parametric bootstrap. This ‘average’ approach may also be used to combine several estimates 


for different values of c in (29). From our experience, this does not improve significantly on our 


basic choice of c suggested above and we do not consider this generalization in the following. 


4.3 Simulation study 


We carried out a simulation study for the following four models when d = 2 and IT is a unit 
square: 


1. T is a DPP with Gaussian kernel G 


o{r) = 


pye where py = 1000 and a = 


0.015, and — logH is a squared zero-mean Gaussian process given by (14) with k = 1, 
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correlation function -Ro(^) = e where s = 0.05, and variance k deduced from (15) 

with q = 0.5. 

2. y is a DPP as above and 11 is the characteristic function for the Boolean model given 
by ((TtI) with deterministic radius Aq = 0.05 and where pq/ is deduced from ( |18[ ) with 
q = 0.5. 

3. y is a Matern hard core process of type II with hardcore distance D = 0.015 and 


= 1736 yielding pYjj = 1000, cf. (10), while 11 is as in model 1. 


4. y is as in model 3., and 11 is as in model 2. 

In each case, 100 independent realizations of X were generated on the unit square. Some 
examples are shown in Figure 

First, we assumed that X and Y are both observed. We did not fit a parametric model for 


y, which is a standard inference problem as explained in Section 4.2 but we estimated q and 
0n by the composite likelihood method detailed in the same section, where 0n = s in models 
1. and 3., and 0n = Aq in models 2. and 4. The value /c = 1 in models 1. and 3. was assumed 


to be fixed. Since the estimation of q in this setting is easy, see (28), we only report in Table 1 


some summary statistics for Ojj. The results demonstrate good performances of the maximum 
composite likelihood estimator. 

Second, we assumed that only X is observed. The hardcore distance D in models 3. and 
4. was then estimated by the minimal pairwise distance observed in X, the value A: = 1 in 
models 1. and 3. was assumed to be fixed, and the other parameters were fitted as explained 


in Section 4.2 either from the pair correlation function, or from the ii'-function, or from an 


optimal linear combination of the former and the latter. The performances of the estimators 
are summarized in Table 2 except for px and D which are standard estimators. For the first 
model, the estimation of s from the iF-function sometimes failed because the optimization 
procedure did not find a minimum. In those circumstances, the figures in Table 2 marked 
with an asterisk are computed from only 90% of the simulated point patterns. Overall, the 
estimation based on the pair correlation function g seems more reliable than the estimation 
based on iF, cf. Table 2. The average estimator {AV) based on an optimal linear combination 
always outperforms the two previous methods in terms of the mean square error. The weights 
used for the combination are reported in Table 2. 


Mean 

Model 1 
sd 

(D 

MSE 

Model 2 
Mean sd 

(Ao) 

MSE 

Mean 

Models (s) 
sd MSE 

Model 4 (Ao) 

Mean sd MSE 

0.05 

0.006 

3.69x10"® 

0.05 0.004 

1.59x10"® 

0.05 

0.005 3.05x10"® 

0.05 0.004 1.56x10"® 


Table 1: Empirical means, standard deviations (sd), and mean square error (MSE) of maxi¬ 
mum composite likelihood estimates for the parameters of the model for 11, when X and Y 
are observed within a unit square. The results are based on 100 simulated datasets for each 


of the four models of Section 4.3 
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Model 

Mean 

9 

sd 

MSE 

Mean 

K 

sd 

MSE 

Mean 

sd 

AV 

MSE 

Weight 

1 

? 

0.48 

0.15 

0.023 

0.49 

0.21 

0.045 

0.48 

0.14 

0.021 

(0.8,0.2) 


a 

0.016 

0.0030 

9.6x10-® 

0.014 

0.0039 

15.6x10“® 

0.015 

0.0028 

8.1x10“® 

(0.7,0.3) 


s 

0.053 

0.018 

3x10--* 

0.059* 

0.049* 

24*xl0““ 

0.053 

0.018 

3x10““ 

(1,0) 

2 

q 

0.52 

0.055 

0.003 

0.50 

0.116 

0.013 

0.52 

0.055 

0.003 

(1,0) 


a 

0.015 

0.0014 

2x10"® 

0.015 

0.0038 

14x10“® 

0.015 

0.0014 

2x10“® 

(1,0) 


^0 

0.052 

0.017 

3x10--* 

0.056 

0.034 

12x10““ 

0.052 

0.017 

3x10““ 

(1,0) 

3 

q 

0.66 

0.05 

0.029 

0.51 

0.16 

0.025 

0.58 

0.09 

0.015 

(0.4,0.6) 


S 

0.07 

0.04 

0.0019 

0.08 

0.09 

0.0098 

0.07 

0.04 

0.0019 

(1,0) 

4 

q 

0.56 

0.050 

0.0066 

0.50 

0.077 

0.0058 

0.53 

0.061 

0.0045 

(0.5,0.5) 


^0 

0.058 

0.009 

1.5x10““ 

0.052 

0.022 

4.9x10““ 

0.058 

0.009 

1.5x10““ 

(1,0) 


Table 2: Empirical means, standard deviations (sd), and mean square error (MSE) of param¬ 
eter estimates based on 100 simulated datasets for the four models of Section when only 
X is observed within a unit square and different minimum contrast estimation procedures are 
used. 


5 Data examples 

This section illustrates how our statistical methodology applies for two real datasets when Y 
is observed (Section |5.1|) or not (Section |5.2[). 


5.1 Allogny dataset 


Figure shows the position of 910 oak trees in a 125 x 188m region at Allogny, France, 
where the 256 solid points correspond to ’’splited oaks”, damaged by frost shake, and the 654 
remaining trees (’’sound oaks”) are represented by small circles. This dataset is available in 



a clustered phenomenon, as the empirical pair correlation function of the splited oaks in 
Figure may suggest. To the best of our knowledge, a parametric model for the dataset has 
yet not been proposed and analyzed. 


We apply our model to this dataset where X represents the splited oaks and Y is the unmarked 
point pattern composed of the splited and the sound oaks. In this application, the inclusion 
probabilities given by IT have a natural interpretation in terms of unobserved environmental 
conditions that locally favor frost shake. Specifically, following the procedure explained in 
Section |4.2[ we fit a parametric model to 11 by the composite likelihood method. We are in 
particularly interested here by the conditional simulation of 11 given the observation of the 
sound oaks and the splited oaks. 


Both models presented in Sections 3.2 3.3| can be considered for IT. However we think that 
a Boolean model is too simple to explain the clustering behaviour of splited trees and we 
therefore assume that — log H is a squared stationary and isotropic Gaussian process given by 
(14) with k = 1 and Rq being a Whittle-Matern correlation function with shape parameter 
v > 0 and scale parameter s > 0 (see Lavancier et al. (2015)). The estimate oi q is q = 


256/910 = 0.28 whereby (15) gives k = 11.8, and by maximizing (27) using a grid of values 
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Figure 5: Left: Allogny dataset of sound oaks (circles) and splited oaks (solid points). Middle: 
splited oaks only. Right: non-parametric estimate of the pair correlation function for all trees 
(solid curve), the sound oaks (dotted-dashed curve), and the splited oaks (dashed curve). 


for V and substituting q by q, we obtain s = 7.8 and z> = 0.5, in which case Rq becomes the 
exponential correlation function. The goodness of fit is assessed by comparing non-parametric 
estimates of the K, F, G, and J functions (see e.g. Mpller and Waagepetersen| ( 2004| )) for the 
splited oaks with 95% pointwise envelopes of the same functions obtained from simulations 
of the fitted model. Here, in accordance with our inference procedure, the simulation of new 
splited oaks is done conditionally on the tree locations, meaning that only H is simulated. 
The results are reported in Figure]^ Furthermore, the same comparison is done for the sound 
oaks in Figure Figures [6]j^ show that the goodness of fit is acceptable. 



Figure 6: From left to right non-parametric estimates of the K, F, G, and J functions for the 
splited oaks (solid lines) along with simulated 95% pointwise envelopes obtained under the 
fitted model of Section |5.1 (dashed lines). 


Next, assuming H follows the fitted model, we simulate H conditional on the splited oaks and 
the sound oaks, using the two steps procedure detailed in Section O Figure shows two 


such realizations of H and an approximation of the conditional expectation obtained from the 
average over 100 independent realizations of H. The white and lighter areas in these grayscale 
plots correspond to regions where frost shake seems unlikely to happen. The two simulated 
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Figure 7: From left to right non-parametric estimates of the iF, F, G, and J functions for the 
sound oaks (solid lines) along with simulated 95% pointwise envelopes obtained under the 
fitted model of Section 5.1 (dashed lines). 


realizations illustrate the ‘roughness’ of II due to the underlying exponential covariance func¬ 
tion. As expected, the conditional expectation of II is large in the neighborhoods of splited 
oaks. 



Figure 8: Typical simulations (left and middle) and approximated expectation (right) of II 
under the fitted model of Section 


5.1 


conditional on the splited oaks and the sound oaks. 


5.2 Ponderosa dataset 


Figure shows the location of 108 ponderosa pine trees in a 120 x 120m area of the Klamath 
National Forest in Northern California. This dataset is available in the spatstat library and 
was studied in |Getis and Franklin (1987). By a descriptive second-order analysis, the authors 
identified different types of clustering between the trees, depending on the scale. In particular 
they noticed that “there are clusters of points and an apparent inhibition effect”. 


We fit our model to this dataset where Y is assumed to be a DPP and II to follow (14) with a 


Gaussian covariance function or ([T^ where Aq is deterministic. Three parametric families of 
kernels were considered for the DPP: the Gaussian covariance functions, the Whittle-Matern 
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Figure 9: Left: Ponderosa dataset. Right; Non-parametric estimate of its pair correlation 
function. 


covariance functions (see Lavancier et al. (2015)), and the Bessel-type covariance functions 


(see Biscio and Lavancier (2015)). These covariance functions depend on the intensity py 
(which is equal to the variance), on a scale parameter a and for the Whittle-Matern and 
Bessel-type covariance functions on an extra shape parameter, denoted v and d, respectively. 
The Gaussian kernels family is in fact a limiting case of the two others families when i/, 
respectively cr, tends to infinity. It is well known that the identification of both a and v 
(respectively u) is difficult, even when Y is fully observed and not thinned by IT. To estimate 


all parameters, we use a minimum contrast method as explained in Section 4.2 for different 
values of v (respectively d) on a grid and then choose the parameters giving the minimal 
value of the contrast function. 

Among all fitted six models, we selected the one associated to the minimal value of the contrast 


function based on the pair correlation function, cf. (29). The best fit was then obtained for 


Y being the DPP with a Gaussian covariance function and II the Boolean model, where the 
minimum contrast procedure together with (18)- (@ give the estimates a = 5.26 (with a 
being the scale parameter of the Gaussian covariance function), q = 0.74, and Aq = 23.14. 
Further, q = 0.74 together with the natural estimate px = 108/(120)^ = 0.0075 give py = 
px/^ = 0.01. When trying to improve the estimation of the selected model by the combination 


method based of the g and K functions (described at the end of Section 4.2), the best weight 
was (1,0), thus confirming the choice of the contrast function based on g for this model. The 
goodness of ht is assessed by comparing the non-parametric estimates of the K, F, G, and 
J functions based on the data with 95% pointwise envelopes of the same functions obtained 
from simulations of the fitted model. The result is shown in Figure [T0| where the htted model 
appears to provide a good ht. 


Acknowledgment 

Supported by the Danish Gouncil for Independent Research — Natural Sciences, grant 12- 
124675, ’’Mathematical and Statistical Analysis of Spatial Data”, and by the ’’Centre for 
Stochastic Geometry and Advanced Bioimaging”, funded by grant 8721 from the Villum 


20 

























Figure 10: From left to right non-parametric estimation of the iF, F, G, and J functions for 
the Ponderosa dataset (solid lines) along with simulated 95% pointwise envelopes under the 
fitted model of Section 5.2 (dashed lines). 
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